IBM6250 Group Project - Coffee Vending

Author

Group 1

Order Task / Deliverable Owner Details
1 Input / Intro section Jarrod Project overview and report framing
2 Exploratory Data Analysis (EDA) Min Data cleaning, visualization, descriptive stats
3 Modeling Eunice Build and validate predictive models
4 Forecast Ceren Predict overall revenue, drink consumption, and ingredient usage

Introduction

Effective inventory control for coffee-vending machines hinges on anticipating weekly ingredient consumption while avoiding costly spoilage. We forecast demand using historical sales from two machines, delivering eight-week projections that guide stock levels and reorder cadence.

This report:

  1. Imports & cleans transaction data from two coffee-vending machines.
  2. Explores key demand drivers.
  3. Models weekly sales with Seasonal ARIMA (plus Prophet as a benchmark).
  4. Delivers eight-week forecasts and stocking recommendations.

Data Input and Combining

Kaggle data is from two vending machines. Below we will import the two datasets and combine them.

Raw Transaction Data

Transaction data was taken from the following Kaggle link:

https://www.kaggle.com/datasets/ihelon/coffee-sales

machine1 <- read_csv("data/index_1.csv") %>% 
  mutate(machine_id = "machine1")

sales <- machine1 |>
  mutate(date = as_date(date),
         datetime = as_datetime(datetime),
         coffee_name=toupper(coffee_name))


skim(sales)
Data summary
Name sales
Number of rows 3636
Number of columns 7
_______________________
Column type frequency:
character 4
Date 1
numeric 1
POSIXct 1
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
cash_type 0 1.00 4 4 0 2 0
card 89 0.98 19 19 0 1316 0
coffee_name 0 1.00 5 19 0 8 0
machine_id 0 1.00 8 8 0 1 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
date 0 1 2024-03-01 2025-03-23 2024-10-06 381

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
money 0 1 31.75 4.92 18.12 27.92 32.82 35.76 40 ▁▃▅▃▇

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
datetime 0 1 2024-03-01 10:15:50 2025-03-23 18:11:38 2024-10-07 02:55:12 3636
unique(machine1$coffee_name)
[1] "Latte"               "Hot Chocolate"       "Americano"          
[4] "Americano with Milk" "Cocoa"               "Cortado"            
[7] "Espresso"            "Cappuccino"         

Products and Ingredients

In the dataset, only product names are given. In order to more accurately predict what ingredients are needed and when, we must decompose the product into its ingredients. See below for the assumptions made for each of the 8 unique products.

Unique Products

unique(machine1$coffee_name)%>%sort()
[1] "Americano"           "Americano with Milk" "Cappuccino"         
[4] "Cocoa"               "Cortado"             "Espresso"           
[7] "Hot Chocolate"       "Latte"              

Ingredients

recipes <- tribble(
  ~coffee_name,            ~coffeeG, ~milkML, ~chocolateG, ~caramelML, ~sugarG, ~vanillaML,
  "AMERICANO",                18,       0,          0,          0,        0,        0,
  "AMERICANO WITH MILK",      18,      60,          0,          0,        0,        0,
  "CAPPUCCINO",               18,      60,          0,          0,        0,        0,
  "COCOA",                     0,     240,         22,          0,       15,        0,
  "CORTADO",                  18,      60,          0,          0,        0,        0,
  "ESPRESSO",                 18,       0,          0,          0,        0,        0,
  "HOT CHOCOLATE",             0,     240,         30,          0,       20,        0,
  "LATTE",                    18,     240,          0,          0,        0,       10
)
Ingredient Logic
Drink Ingredient-logic rationale
Espresso Straight double shot: 18 g ground coffee, no additives (Specialty Coffee Association 2018).
Americano Same 18 g espresso diluted with ≈ 4 × its volume of hot water; nothing else required (Specialty Coffee Association 2018).
Americano with Milk Americano softened with ≈ 60 ml steamed milk – enough to mellow bitterness without turning it into a latte (Cordell 2024).
Cappuccino Classic 1 : 1 : 1 build – espresso, ≈ 60 ml steamed milk, equal micro-foam – fills a 150–180 ml cup (Raffii 2024).
Cortado Spanish “cut” drink: equal parts double espresso and ≈ 60 ml steamed milk (Wine Editors 2025).
Latte U.S. latte stretches the shot with ≈ 240 ml milk (1 : 4–5 ratio); vanilla version adds 10 ml syrup (≈ 2 pumps) (Coffee Bros. 2024; Page 2025).
Cocoa Non-coffee mix: 240 ml milk + 22 g cocoa powder + 15 g sugar – standard stovetop proportions (Hersheyland Test Kitchen 2025).
Hot Chocolate Richer café blend: same milk but 30 g cocoa and 20 g sugar for modern sweetness level (Hersheyland Test Kitchen 2025).

Combining Transaction Data and Recipies

Below we will join the two tables on the coffee name, which will add ingredients to all rows in the transaction data. Explore the data we will use in our analysis below:

sales_ingredients <- sales |>
  left_join(recipes, by = "coffee_name") |>
  replace_na(list(
    coffee = 0, milk = 0, chocolate = 0, caramel = 0,
    whiskey = 0, tea = 0, vanilla = 0
  ))

Converting to Weekly Series

We aggregate to a weekly time series because the business decisions we are informing, like re-ordering coffee, milk, chocolate, etc, are made on a weekly cadence. Collapsing daily transactions into weeks smooths out erratic, day-to-day swings leaving a cleaner signal that aligns directly with the quantity we must predict.

We will also convert to a time series type object and verify it has no gaps in the series. If we see FALSE from .gaps, then we have no gaps.

weekly_sales <- sales_ingredients |>
  mutate(week = lubridate::floor_date(date, unit = "week")) |>
  group_by(week) |>
  summarise(across(coffeeG:vanillaML, sum, na.rm = TRUE),
            sales_n = n()) |>
  ungroup()

weekly_sales <- weekly_sales|>
  as_tsibble(index = week)

has_gaps(weekly_sales)
# A tibble: 1 × 1
  .gaps
  <lgl>
1 FALSE

Final Data for use in Analysis

Weekly ingredient demand vs. cups sold

Exploratory Data Analysis(EDA)

#weekly total sales per machine
weekly_sales_per_machine <- sales_ingredients |>
  mutate(week = lubridate::floor_date(date, unit = "week")) |>
  group_by(machine_id, week) |>
  summarise(across(coffeeG:vanillaML, sum, na.rm = TRUE),
            sales_n = n(),
            .groups = "drop") |>
  as_tsibble(index = week, key = machine_id)

weekly_sales_per_machine %>% 
  autoplot(sales_n)+
  labs(title = "Weekly Total Sales per Machine",
       y = "Number of Sales",
       x = "Week") +
  guides(color = guide_legend(title = "Machine ID")) +
  theme_minimal()

#Decomposition of Weekly Ingredient Usage
decomposed <- weekly_long %>%
  filter(metric != "sales_n") %>%
  model(STL(value)) %>%
  components()


autoplot(decomposed) +
  facet_wrap(~metric, scales = "free_y") +
  labs(title = "Decomposition of Weekly Ingredient Usage",
       x = "Week",
       y = NULL) +
  theme_minimal()+
  theme(legend.position = "none")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

#checking Autocorrelation of each ingredients
weekly_long %>%                       # your original data-frame  
  filter(metric != "sales_n") %>%     # keep the same exclusion
  group_by(metric) %>%                # work metric-by-metric
  group_walk(~{
    v <- .x$value
    v <- v[is.finite(v)]             # drop NA / Inf

    if (length(v) < 2L || var(v) == 0) {
      message("Skipping ", .y$metric, ": not enough finite variation.")
    } else {
      acf(v,
          main = paste("ACF for", .y$metric),
          na.action = na.pass)        # keeps plotting even if a few NA remain
    }
  })

#Differencing each ingredients
weekly_diff <- weekly_long %>%
  filter(metric != "sales_n") %>%
  group_by(metric) %>%  
  arrange(week) %>%  
  mutate(value_diff = difference(value)) %>%  
  filter(!is.na(value_diff)) %>%  
  ungroup()



#ACF for Differenced Ingredients
weekly_diff %>%                          # your differenced data frame
  group_by(metric) %>%                   # handle one metric at a time
  group_walk(~{
    v <- .x$value_diff                   # pull the differenced vector

    # 1. keep only finite, non-missing observations
    v <- v[is.finite(v)]

    # 2. check that we still have at least two values AND some variance
    if (length(v) < 2L || var(v) == 0) {
      message("Skipping ", .y$metric,
              ": not enough finite variation after differencing.")
    } else {
      acf(v,
          lag.max = 30,
          main = paste("ACF for Differenced", .y$metric),
          na.action = na.pass)           # ignore any intermittent NA gaps
    }
  })

Modeling Process

Model Evaluation and Diagnostics

Generate Forecasts

Executive Summary with Actionable Recommendations

Appendix

Machine 1

Machine 1 Weekly ingredient demand vs. cups sold

References

Coffee Bros. 2024. “Milk-to-Espresso Ratio Calculator.” 2024. https://coffeebros.com/pages/milk-to-espresso-ratio-calculator.
Cordell, George. 2024. “Americano Coffee with Milk Recipe.” 2024. https://coffeelikers.com/americano-coffee-with-milk/.
Hersheyland Test Kitchen. 2025. “Hot Cocoa for One.” 2025. https://www.hersheyland.com/recipes/hot-cocoa-for-one.html.
Page, Amazon Product. 2025. “Torani Vanilla Syrup Pump – 8 Ml Per Pump.” 2025. https://www.amazon.com/dp/B09P49HWKK.
Raffii, Ahlam. 2024. “How to Make the Perfect Cappuccino.” 2024. https://www.thespruceeats.com/how-to-make-cappuccinos-766116.
Specialty Coffee Association. 2018. “Defining the Ever-Changing Espresso.” https://sca.coffee/sca-news/25-magazine/issue-3/defining-ever-changing-espresso-25-magazine-issue-3.
Wine Editors, Food &. 2025. “8 Types of Coffee Drinks to Order Around the World.” 2025. https://www.foodandwine.com/types-of-coffee-drinks-11724561.